Abstract. Atomicity (or linearizability) is a commonly used consistency
criterion for distributed services and objects. Although atomic object im- 
plementations are abundant, proving that algorithms achieve atomicity 
has turned out to be a challenging problem. In this paper, we initiate 
the study of systematic ways of verifying distributed implementations 
of atomic objects, beginning with read/write objects (registers). Our 
general approach is to replace the existing operational reasoning about 
events and partial orders with assertional reasoning about invariants and 
simulation relations. To this end, we define an abstract state machine 
that captures the atomicity property and prove correctness of the object 
implementations by establishing a simulation mapping between the im- 
plementation and the specification automata. We demonstrate the gen- 
erality of our specification by showing that it is implemented by three 
different read/write register constructions: the message-passing register 
emulation of Attiya, Bar-Noy and Dolev, its optimized version based on 
real time, and the shared memory register construction of Vitanyi and 
Awerbuch. In addition, we show that a simplified version of our specifi- 
cation is implemented by a general atomic object construction based on 
Lamport's replicated state machine algorithm.



1 Introduction 

Many distributed and network-based services can be modeled as shared objects 
accessible to (possibly remote) clients through well-defined interfaces. Atomicity 
[16, 21] (also known as linearizability [10]) is a desirable property for such objects 
as it allows clients using the objects to perceive the operations that occur in each 
run as occurring atomically, in some sequential order. This perception makes it 
easier to understand the behavior of a system using distributed services, and so, 
simplifies the task of system design. 

Atomic services could be implemented simply on single server machines. How- 
ever, to achieve high availability in a distributed system and to tolerate failures, 
atomic services are typically implemented by distributed algorithms. Many dis- 
tributed algorithms have been proposed for implementing atomic objects; see, 
for example, [17, 15, 36, 27, 33, 32, 10, 35, 14, 19, 18, 22, 23, 8]. These use a range
of techniques to achieve the appearance of total ordering, for example, assigning
timestamps and processing operations in timestamp order, or using quorum
configurations. 

Although atomic object implementations are abundant, proving that algorithms
achieve atomicity has turned out to be a challenging problem. Most existing
proofs for such algorithms are long, subtle, and difficult to understand and
check. As evidence of the difficulty, we note that several published proofs for 
implementations of atomic shared read/write memory objects have later been 
shown to be incorrect. We believe that a fundamental reason for the difficulty of 
these proofs is their style: they are based on detailed, not very systematic,
reasoning about events and their ordering. Useful structure in such proofs is often
provided by lemmas about partial orders of operations on objects, for example,
Proposition 3 of [16] (for single-writer read/write objects) and Lemma 13.16 of
[21] (for multi-writer read/write objects). These lemmas provide sufficient
conditions for correctness of atomic read/write object implementations, based on
a list of properties that a partial ordering of operations must satisfy. However, 
showing that these properties hold still requires detailed, ad hoc reasoning about 
events (see, e.g., [22,23]). 

In this paper, we study systematic ways of verifying distributed implementations
of atomic objects, beginning with read/write objects (registers). Our
general approach is to replace operational reasoning about events and partial 
orders with assertional reasoning about invariants and simulation relations. The 
assertional methods differ from the traditional operational arguments in two 
important ways. First, the system properties are stated precisely in terms of 
predicates over the system state components. Second, assertional proofs can be 
checked by examining individual state transitions of the algorithm without rea- 
soning about entire executions. As such they lend themselves to mechanization, 
i.e., the process of checking a proof can be carried out using interactive tools, 
such as theorem provers. 

Our approach to carrying out assertional atomicity proofs is first to define 
an abstract state machine that captures the atomicity property and then, prove 
correctness of the object implementations by establishing a simulation mapping 
between the implementation and the specification automata. The challenge is 
to find a specification automaton that is general enough to apply to many ex- 
isting implementations, and at the same time sufficiently close to the actual 
implementations to simplify the task of finding the mapping. One example of 
an atomicity specification that turned out to be too abstract for carrying out 
simulation proofs is the canonical atomic object automaton of Section 13.1.2 
of [21]. The canonical object automaton maintains a buffer used to store incom- 
ing client requests. Buffered requests can later be applied to the object state, 
and the generated responses are returned to their originators. Unfortunately, 
this specification, though simple, does not provide sufficient detail to allow for 
easy match with concrete implementations. 



We therefore give more detailed specifications. Namely, we define an abstract
state machine, which we call the Partial-Order Machine (PO-Machine),
which records information about operations and their orders in its state. The
PO-Machine expresses the common behavior of many existing atomic register
implementations, in which client operation requests are gradually ordered rela- 
tive to other operation requests until all the necessary ordering constraints are 
achieved. The ordering constructed is, in the limit, guaranteed to be a partial 
order of the requested operations that satisfies sufficient conditions for showing 
atomicity. 

We use the PO-Machine as a formal specification for distributed algorithms 
that implement atomic memory. We show that it is implemented by three dif- 
ferent read/write register constructions: the message-passing emulation of At- 
tiya, Bar-Noy, and Dolev (ABD) [3] (extended to handle multiple writers as in 
[23]), an optimized version of ABD that takes advantage of synchronized clocks 
at writers [8], and the unbounded version of the shared memory construction 
of a multi-writer/multi-reader register from single-writer/single-reader registers 
of [36]. We also show that a slight modification of the PO-Machine, called the
TO-Machine, can be used to prove atomicity of a general (i.e., not necessarily 
read/write) object implementation based on the replicated state machine proto- 
col of Lamport [15]. 

We specify the PO-Machine and the algorithms formally using the I/O Automata
(IOA) [20] and Timed IOA [12, 11] models; in fact, we use the formal specification
languages that have been defined for these models. The IOA/TIOA
specification languages lead to very stylized assertional proofs for invariants and 
simulation relations that can be partially automated using theorem provers. 
Moreover, the same IOA specifications can be used by the IOA compiler [31, 30] 
to produce executable Java code. 

Other related work: Our use of a partial order automaton as an abstract specification
was inspired by prior work of Fekete et al. on specifying the behavior
of an Eventually Serializable Data Service [9]. Their specification used a (different)
partial-order machine, which expresses weaker consistency requirements
than atomicity. The algorithm studied in [9], based on an earlier algorithm of 
Liskov et al. [13], was shown to achieve this weaker form of consistency. 

The only other published simulation-based atomicity proofs we are aware of 
are those of Bogdanov [5] (replicated state machine), and Doherty et al. (lock- 
free queue) [7]. The proofs in both these papers are complicated: They involve 
multiple levels of abstraction as well as both forward and backward simulations.
In contrast, every construction considered in this paper is shown to be atomic 
by exhibiting a single forward simulation directly from the implementation au- 
tomaton to a specification automaton. 

Another example of using assertional reasoning for proving atomicity is the 
work by Wang and Stoller [37], which uses static analysis combined with model
checking to verify atomicity of code blocks involving lock-free synchronization 
primitives. A more general discussion of assertional proof techniques can be 
found in [28].



The rest of the paper is organized as follows: In Section 2, we introduce 
preliminary definitions and notation used throughout the paper. The sufficient 
condition for proving atomicity is specified in Section 3. The PO-Machine is 
described in Section 4. The ABD algorithm is presented and proved correct in 
Section 5. A time-based version of ABD is discussed in Section 6. Section 7 
briefly discusses the proofs of the Vitanyi-Awerbuch register construction and
of Lamport's replicated state machine. Section 8 discusses future directions.
For lack of space, we only outline intuition and highlight basic ideas underlying 
the correctness proofs. The detailed proofs can be found in the full version of 
the paper [6]. 

2 Preliminary Definitions 

We use the I/O Automata (IOA) [20] model to formally specify services, describe
algorithms and carry out proofs. An I/O automaton is a nondeterministic state
machine whose state can change atomically through a discrete transition labeled 
by a discrete action. The set of the automaton's actions is called the action 
signature of the automaton. The actions can be either external or internal. The
external actions, which can be either input or output, model interaction with 
the automaton's environment; and the internal actions model local computation 
steps. In Section 6, we also use the Timed I/O Automata (TIOA) model [12, 
11], which, in addition to discrete transitions, also allows the automata state to 
evolve by trajectories, which describe evolution of the state over time. 

We use forward simulations to carry out atomicity proofs. Informally, a for- 
ward simulation is a relationship between the states of two automata requiring 
that the transitions of one system can in some sense be mimicked by the other. 
A precise definition of the simulation formalism can be found in [21]. 

The read/write service. A read/write object (a register) type consists of the
following components: (1) an arbitrary set of values $V$ with an initial value $v_0$,
(2) the set of operations of the form $\mathit{write}(v)$, $v \in V$, and $\mathit{read}$, (3) the set of
responses, consisting of $\mathit{ack}$ and $v \in V$, and (4) the sequential specification $f$ such that
$f(w, \mathit{write}(v)) = (v, \mathit{ack})$ and $f(w, \mathit{read}) = (w, w)$.
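
As an illustration only, the sequential specification can be sketched in Python. The tuple encoding of operations (`("write", v)` and `("read",)`) and the helper `run_sequential` are assumptions made for this sketch; they do not appear in the paper.

```python
from typing import Any, List, Tuple

ACK = "ack"  # the response returned by every write


def f(w: Any, op: Tuple) -> Tuple[Any, Any]:
    """Sequential specification: (state, operation) -> (new state, response)."""
    if op[0] == "write":
        _, v = op
        return v, ACK      # a write installs v and acknowledges
    if op[0] == "read":
        return w, w        # a read leaves the state unchanged and returns it
    raise ValueError("unknown operation: %r" % (op,))


def run_sequential(v0: Any, ops: List[Tuple]) -> List[Any]:
    """Apply operations one at a time, starting from the initial value v0."""
    state, responses = v0, []
    for op in ops:
        state, response = f(state, op)
        responses.append(response)
    return responses
```

A sequential run such as `run_sequential(0, [("read",), ("write", 3), ("read",)])` is the kind of behavior that an atomic implementation must appear to produce.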

A read/write service implements a shared read/write register. To access the
service, a client issues an operation descriptor consisting of a location identifier
loc and an operation identifier id. In addition, the write operation descriptor
also contains a value val. We often refer to an operation descriptor $x$ simply
as operation $x$, and denote its various components by $x.loc$, $x.id$, and $x.val$. We
denote by $O_w$ and $O_r$ the sets of the write and the read operations respectively,
and by $O = O_w \cup O_r$ the set of all operations. For a set $X \subseteq O$, we denote by
$X.id = \{x.id : x \in X\}$ the set of identifiers of operations in $X$.

Clients use the actions of the form request($x$), $x \in O$, and response($x$, $v$), $x \in O$,
$v \in V \cup \{\mathit{ack}\}$, to issue operation requests and receive responses respectively.
Given a sequence $\beta$ of the request and response actions, a requested operation
$x$ is said to be complete in $\beta$ if $\beta$ contains response($x$, $v$) for some $v \in V \cup \{\mathit{ack}\}$,
which we call the return value of $x$.



We say that $\beta$ is well-formed if there exists a function cause mapping each
response event to a preceding request event in $\beta$ so that the following is satisfied:
(1) For each response event $e$ = response($x$, $*$), cause($e$) = request($x$) (i.e.,
responses are not spuriously generated); and (2) cause is one-to-one (i.e., responses
are not duplicated).$^1$
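
The well-formedness condition can be checked mechanically. The sketch below is an illustration under an assumed event encoding, `("request", x)` and `("response", x, v)`, which is not from the paper. It relies on the observation that a one-to-one cause function exists exactly when, in every prefix, no operation has more responses than requests.

```python
from collections import Counter


def is_well_formed(beta):
    """True iff every response can be matched one-to-one to an earlier request."""
    unmatched = Counter()  # requests for x not yet consumed by a response
    for event in beta:
        kind, x = event[0], event[1]
        if kind == "request":
            unmatched[x] += 1
        elif kind == "response":
            if unmatched[x] == 0:
                return False  # spurious or duplicated response
            unmatched[x] -= 1
        else:
            raise ValueError("unknown event kind: %r" % (kind,))
    return True


def return_values(beta):
    """Map each complete operation to its return value."""
    return {event[1]: event[2] for event in beta if event[0] == "response"}
```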

The following definition will be used throughout the paper: Let $\Pi$ be a set of
read and write operations, and $R$ be a binary relation over $\Pi$. For an operation
$\pi \in \Pi$ we define

$$\text{last-prec-writes}(\pi, R) = \{\omega \in O_w : (\omega, \pi) \in R \land \nexists\, \omega' \in O_w : (\omega, \omega') \in R \land (\omega', \pi) \in R\}.$$
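
A direct transcription of this definition into Python may help in reading it. The representation of $R$ as a set of pairs and the explicit `writes` argument (standing in for $O_w$) are assumptions of this sketch.

```python
def last_prec_writes(pi, R, writes):
    """Writes that precede pi in R with no other write in between."""
    preceding = {w for w in writes if (w, pi) in R}
    return {
        w for w in preceding
        # exclude w if some write w2 lies strictly between w and pi
        if not any((w, w2) in R and (w2, pi) in R for w2 in writes)
    }
```

For example, if `w1` precedes `w2` and both precede the read `r`, only `w2` is returned.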

3 Atomicity 

Atomicity (or linearizability) is specified as a property satisfied by the object 
implementation traces. It is typically defined in terms of the existence of serial- 
ization points for operations so that shrinking the operations to occur at their 
serialization points results in a valid sequential execution of the read/write register
(see, e.g., Chapter 13 of [21], Chapter 9 of [4], or [10]). For our purposes in
this paper, it is enough to give a sufficient condition for proving atomicity; this 
condition is equivalent to the one in Lemma 13.16 of [21]. 

Let $\beta$ be a well-formed sequence of the actions of the read/write service
interface that contains no incomplete operations, and $\Pi$ be the set of operations
requested in $\beta$. We say that $\beta$ satisfies the Partial Order property (henceforth
referred to as PO) if there exists an irreflexive partial ordering $\prec$ of all the
operations in $\Pi$, satisfying the following:

Property 1 (PO Constraints)

1. If the response event for $\pi$ precedes the request event for $\phi$ in $\beta$, then $\phi \nprec \pi$.

2. For any two write operations $\pi$ and $\phi$ in $\Pi$, either $\pi \prec \phi$ or $\phi \prec \pi$.

3. If $\pi$ is a write operation in $\Pi$ and $\phi$ is a read operation in $\Pi$ whose request
event follows the response event for $\pi$, then $\pi \prec \phi$.

4. If $\pi$ is a read operation in $\Pi$ and $\phi$ is a read operation whose request event
follows the response event for $\pi$, then for each $\omega \in \text{last-prec-writes}(\pi, \prec)$,
$\omega \prec \phi$.

5. Let $\pi$ be a read operation in $\Pi$, and $v$ be the value returned by $\pi$. If
$\text{last-prec-writes}(\pi, \prec) \neq \emptyset$, then $v = \omega.val$ for some
$\omega \in \text{last-prec-writes}(\pi, \prec)$. Otherwise, $v = v_0$.
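
For small, finite traces the five constraints can be tested directly. The following is a sketch for experimentation, not a proof technique from the paper. It assumes that each operation identifier is requested once, that `beta` is well-formed and complete, and that `order` is a set of pairs `(a, b)` meaning $a \prec b$ that is already an irreflexive, transitive relation; the encodings of events and operations are hypothetical.

```python
def check_po(beta, ops, order, v0):
    """Check constraints 1-5 of Property 1 for a candidate ordering.

    beta:  list of ("request", x) / ("response", x, v) events
    ops:   dict x -> ("write", val) or ("read",)
    order: set of pairs (a, b) meaning a precedes b
    """
    req, resp, ret = {}, {}, {}
    for i, event in enumerate(beta):
        if event[0] == "request":
            req[event[1]] = i
        else:
            resp[event[1]], ret[event[1]] = i, event[2]
    writes = {x for x, d in ops.items() if d[0] == "write"}
    reads = set(ops) - writes

    def lpw(pi):  # last-prec-writes(pi, order)
        before = {w for w in writes if (w, pi) in order}
        return {w for w in before if not any((w, w2) in order for w2 in before)}

    for a in ops:
        for b in ops:
            if a == b:
                continue
            a_before_b = resp[a] < req[b]  # a responded before b was requested
            if a_before_b and (b, a) in order:
                return False  # constraint 1
            if a in writes and b in writes and (a, b) not in order and (b, a) not in order:
                return False  # constraint 2
            if a_before_b and a in writes and b in reads and (a, b) not in order:
                return False  # constraint 3
            if a_before_b and a in reads and b in reads:
                if any((w, b) not in order for w in lpw(a)):
                    return False  # constraint 4
    for r in reads:  # constraint 5
        last = lpw(r)
        if last and ret[r] not in {ops[w][1] for w in last}:
            return False
        if not last and ret[r] != v0:
            return False
    return True
```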

The following lemma is proved in [6]:

Lemma 1. $\beta$ satisfies PO iff there exists an irreflexive partial ordering of all
the operations in $\Pi$, satisfying the (more restrictive) constraints of Lemma 13.16
of [21].

From the above result and Lemma 13.16 of [21], we obtain:

Lemma 2. If $\beta$ is well-formed and satisfies PO, then $\beta$ satisfies atomicity.

1 Note that our notion of well formedness is weaker than that usually found in the 
literature as it allows requests from the same location to be issued concurrently. 



4 The PO-Machine 

In this section we define the Partial-Order Machine. First, we formally specify
the environment assumptions of the read/write service. This environment is
represented by a single automaton, called Users, whose code can be found
in [6]. The Users automaton contains a single variable requested to keep track
of the ids of requested operations, in order to avoid repeats. An implementation 
of the environment would not have such a variable, but would use some other 
mechanism to ensure unique operation ids (e.g., client id and a counter). 

Lemma 3. For $x, y \in \mathit{requested}$, $x = y \Leftrightarrow x.id = y.id$.

The PO-Machine signature and state variables appear in Figure 1, and its
transitions appear in Figure 2. This automaton maintains a partial order in 
its state, represented by variables vertices and edges. Vertices correspond to 
requested operations, and edges to ordering relationships that have been deter- 
mined for these operations. When a request arrives, it is put into vertices; later, 
it becomes classified as ordered, then completed, and finally, responded. Edges 
may be added at any time from ordered write operations to unordered ones (see 
action add-edge). 

An unordered operation $\pi$ may become ordered at any time after it has
acquired incoming edges from all write operations that completed before $\pi$ began
(i.e., all writes in prec($\pi$)). This ensures that constraints 1 and 3 of Property 1
hold among all writes, and between writes and reads. Constraint 1 is also trivially
preserved among reads as edges originating at read requests are disallowed by
the PO-Machine signature (see Figure 1). When a write operation $\pi$ becomes ordered,
new edges are inserted to ensure that $\pi$ is ordered with respect to all previously
ordered write operations (see action order) so that constraint 2 of Property 1 is
satisfied.

An ordered operation may become completed at any time; when a read operation
$\phi$ completes, it also forces each write operation $\pi$ immediately preceding $\phi$
in the partial order to complete. This ensures that every read operation invoked
after $\phi$ completes will find $\pi$ in its prec set, and will therefore become ordered
only after it has an incoming edge from $\pi$. This guarantees that constraint 4 of
Property 1 is satisfied, and also captures the essence of the "helping" mechanism
found in many atomic register implementations.

A completed operation is allowed to return a response. The response returned 
by a read operation is the value written by the last preceding (in the partial 
order) write operation, or the initial value if no such write exists (see action 
response). Thus, constraint 5 of Property 1 is satisfied. 

In [6], we prove that the limit of the transitive closure of (vertices, edges), 
maintained in the derived variable dag, satisfies Property 1. Since every trace 
of PO-Machine is obviously well-formed, by Lemma 2, PO-Machine implements
an atomic register: 

Theorem 1. Each trace of the PO-Machine satisfies atomicity. 



Signature:
  Input:
    request($x$), $x \in O$
  Output:
    response($x$, $v$), $x \in O$, $v \in V \cup \{\mathit{ack}\}$
  Internal:
    add-edge($x$, $y$), $x \in O_w$, $y \in O_w \cup O_r$
    order($x$), $x \in O$
    complete($x$), $x \in O$

State:
  vertices $\subseteq O$, initially empty
  ordered $\subseteq O$, initially empty
  completed $\subseteq O$, initially empty
  responded $\subseteq O$, initially empty
  edges $\subseteq O \times O$, initially empty
  prec is a partial function from $O$ to $2^{O_w}$, initially empty

Derived vars:
  dag, the transitive closure of (vertices, edges)
  For $x \in O_r$: last-writes($x$) = last-prec-writes($x$, dag)

Fig. 1. PO-Machine signature and states



Input request($x$)
  Effect:
    vertices := vertices $\cup\ \{x\}$
    prec($x$) := completed $\cap\ O_w$

Internal add-edge($x$, $y$)
  Precondition:
    $y \in$ vertices $-$ ordered
    $x \in$ ordered
  Effect:
    edges := edges $\cup\ \{(x, y)\}$

Internal order($x$), $x \in O_w$
  Precondition:
    $x \in$ vertices $-$ ordered
    $\forall y \in$ prec($x$) : $(y, x) \in$ dag
  Effect:
    edges := edges $\cup\ \{(x, y) : y \in$ ordered $\cap\ O_w \land (y, x) \notin$ dag$\}$
    ordered := ordered $\cup\ \{x\}$

Internal order($x$), $x \in O_r$
  Precondition:
    $x \in$ vertices $-$ ordered
    $\forall y \in$ prec($x$) : $(y, x) \in$ dag
  Effect:
    ordered := ordered $\cup\ \{x\}$

Internal complete($x$)
  Precondition:
    $x \in$ ordered $-$ completed
  Effect:
    completed := completed $\cup\ \{x\}$
    if $x \in O_r$ then
      $\forall y \in$ last-writes($x$) do
        completed := completed $\cup\ \{y\}$

Output response($x$, ack), $x \in O_w$
  Precondition:
    $x \in$ completed $-$ responded
  Effect:
    responded := responded $\cup\ \{x\}$

Output response($x$, $v_0$), $x \in O_r$
  Precondition:
    $x \in$ completed $-$ responded
    last-writes($x$) $= \emptyset$
  Effect:
    responded := responded $\cup\ \{x\}$

Output response($x$, $v$), $x \in O_r$
  Precondition:
    $x \in$ completed $-$ responded
    last-writes($x$) $\neq \emptyset$
    $v = w.val$ : $w \in$ last-writes($x$)
  Effect:
    responded := responded $\cup\ \{x\}$

Fig. 2. PO-Machine transitions
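
The transitions of Figure 2 can be exercised with a small executable model. The class below is a sketch written for illustration, not the IOA code of the paper. The `ops` dictionary (which records each operation's kind and value), the `_require` helper, and the use of `RuntimeError` for a violated precondition are assumptions of this sketch.

```python
class POMachine:
    """Executable sketch of the PO-Machine state and transitions."""

    def __init__(self, v0):
        self.v0 = v0
        self.ops = {}  # hypothetical bookkeeping: id -> ("write", val) or ("read",)
        self.vertices, self.ordered = set(), set()
        self.completed, self.responded = set(), set()
        self.edges = set()
        self.prec = {}

    @staticmethod
    def _require(condition, message):
        if not condition:
            raise RuntimeError("precondition violated: " + message)

    def _is_write(self, x):
        return self.ops[x][0] == "write"

    def dag(self):
        """Transitive closure of the edge set (naive fixpoint iteration)."""
        closure = set(self.edges)
        changed = True
        while changed:
            changed = False
            for (a, b) in list(closure):
                for (c, d) in list(closure):
                    if b == c and (a, d) not in closure:
                        closure.add((a, d))
                        changed = True
        return closure

    def last_writes(self, x):
        dag = self.dag()
        before = {w for w in self.vertices if self._is_write(w) and (w, x) in dag}
        return {w for w in before if not any((w, w2) in dag for w2 in before)}

    def request(self, x, op):
        self.ops[x] = op
        self.vertices.add(x)
        self.prec[x] = {w for w in self.completed if self._is_write(w)}

    def add_edge(self, x, y):
        self._require(x in self.ordered and self._is_write(x), "x must be an ordered write")
        self._require(y in self.vertices - self.ordered, "y must be unordered")
        self.edges.add((x, y))

    def order(self, x):
        dag = self.dag()
        self._require(x in self.vertices - self.ordered, "x must be unordered")
        self._require(all((y, x) in dag for y in self.prec[x]), "missing edge from prec(x)")
        if self._is_write(x):
            # order x before every ordered write that does not already precede it
            self.edges |= {(x, y) for y in self.ordered
                           if self._is_write(y) and (y, x) not in dag}
        self.ordered.add(x)

    def complete(self, x):
        self._require(x in self.ordered - self.completed, "x must be ordered, not completed")
        self.completed.add(x)
        if not self._is_write(x):
            self.completed |= self.last_writes(x)  # the "helping" step

    def response(self, x):
        self._require(x in self.completed - self.responded, "x must be completed")
        self.responded.add(x)
        if self._is_write(x):
            return "ack"
        last = self.last_writes(x)
        return self.ops[next(iter(last))][1] if last else self.v0
```

In this model, a read requested after a write has completed cannot be ordered until `add_edge` has connected the write to it, which mirrors the role of prec in the description above.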



5 The Attiya, Bar-Noy, and Dolev Algorithm 

In this section, we present a distributed wait-free implementation of an atomic 
multi-writer/multi-reader register based on the well-known message-passing al- 
gorithm of Attiya, Bar-Noy, and Dolev [3] (which we call ABD). We prove cor- 
rectness of ABD by showing that ABD implements PO-Machine, which by The- 
orem 1, implies that ABD implements an atomic register. 

The original ABD protocol implements a wait-free atomic read/write regis- 
ter using a collection of n processes communicating among themselves through 
reliable point-to-point channels. The implementation tolerates crashes of fewer
than $n/2$ processes. Each process in ABD is responsible both for handling the client
operation requests and for storing and updating the local copy of the register value.



Here, we present a generalized version of ABD where we let the two roles in 
the ABD protocol be performed by two classes of agents: clients and replicas. 
This design allows for flexibility in assigning roles to actual network locations 
thus simplifying the algorithm deployment in real systems. We also use a separate
client to handle each user request so that the actual clients can handle any
number of requests in any order (for example, requests can be partitioned
among several threads, or executed sequentially). Our implementation
also supports multiple writers using the technique of [23].

We now describe the ABD implementation (the ABD automaton) in more 
detail. Let $P$ be a finite set of replicas. We define a quorum system $\mathcal{Q}$ on $P$ to
be the union of a set of write quorums $\mathcal{Q}_w$ and a set of read quorums $\mathcal{Q}_r$.
$\mathcal{Q}_w$ and $\mathcal{Q}_r$ are sets of subsets of $P$ such that for each $Q_w \in \mathcal{Q}_w$ and $Q_r \in \mathcal{Q}_r$,
$Q_w \cap Q_r \neq \emptyset$. The ABD automaton is the composition of the Users automaton
of Section 4, the client automata $C_x$, $x \in O$, the replica automata $R_p$, $p \in P$,
and the reliable point-to-point channel automata connecting each client $C_x$ with
replica $R_p$ and vice versa. The client's interface and state variables appear in
Figure 3. The code of the reader client, the writer client and the replica appear 
in Figures 4, 5, and 6 respectively. We do not present the specification for the 
channel automata as their functionality is obvious. 
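
The intersection requirement on quorums is easy to test for a concrete configuration. The sketch below uses majority quorums as one familiar instance; the paper's definition allows any read and write quorum sets that pairwise intersect, and the function names here are hypothetical.

```python
from itertools import combinations


def majority_quorums(replicas):
    """All subsets containing a bare majority of the replicas."""
    k = len(replicas) // 2 + 1
    return [frozenset(c) for c in combinations(sorted(replicas), k)]


def is_quorum_system(write_quorums, read_quorums):
    """True iff every write quorum intersects every read quorum."""
    return all(qw & qr for qw in write_quorums for qr in read_quorums)
```

With three replicas, `majority_quorums` yields the three two-element subsets, and any two of them share a replica.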

The value stored at each replica is associated with a tag. Tags are two-field 
records consisting of a sequence number sn, which is a non-negative integer, and 
a request identifier id. Tags are ordered lexicographically, with precedence
given to the sequence number field.
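
In Python, this ordering comes for free from tuple comparison. The `Tag` type below is a sketch; representing request identifiers as strings is an assumption.

```python
from typing import NamedTuple


class Tag(NamedTuple):
    """A tag: sequence number first, request identifier as tie-breaker."""
    sn: int
    id: str


def highest(tags):
    """The largest tag under the lexicographic order."""
    return max(tags)
```

Named tuples compare field by field, so `Tag(2, "a")` exceeds `Tag(1, "z")`, and identifiers matter only when sequence numbers are equal.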

Clients access read (resp. write) quorums by first sending a message to all
the replicas, and then awaiting responses from a read (resp. write) quorum.
The request handling at clients involves two rounds of quorum accesses, called 
the read phase and the write phase respectively, such that a read quorum is 
contacted during the read phase, and a write quorum is contacted during the 
write phase. A client keeps track of the request progress through the phases 
using the variable status. The operation's status is initially idle. It is changed 
to pending (p) at the beginning of the read phase. It becomes sending (s) at the 
beginning of the write phase. It is changed to committed (c) upon completion of 
the write phase, and finally to responded (r) after a response is returned. 

Specifically, to handle a write request $x$, the client $C_x$ (see Figure 5) performs
a read phase to determine the highest tag $t$ associated with the values stored at
some read quorum. It then performs a write phase to store the value $x.val$ associated
with tag $(t.sn + 1, x.id)$ at a write quorum. It then responds with ack. To handle a
read request $y$, client $C_y$ (see Figure 4) first performs a read phase to determine
the value $v$ associated with the highest tag $t$ among those associated with the
values stored at some read quorum. It then performs a write phase to guarantee
that the pair $(t, v)$ is stored at a write quorum. It then responds with $v$.

The replica's algorithm (see Figure 6) is simple: In response to a read phase
message, a replica $p$ either responds with its current tag (for write requests), or
the current tag and the value (for read requests). In response to a write phase
message carrying a tag which is bigger than $p$'s current tag, $p$ overwrites its
current tag and the value with those in the message. Otherwise, $p$'s state is
left unchanged. In both cases, $p$ responds with ack.
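
The two-phase structure described above can be sketched as a sequential, in-memory simulation. This is an illustration under strong simplifying assumptions: there are no channels, no concurrency, and no failures, the caller picks the quorums explicitly, and the initial tag `(0, "")` is a placeholder chosen for the sketch. All names are hypothetical.

```python
class Replica:
    """Stores a value together with its tag (sn, id)."""

    def __init__(self, v0, initial_tag=(0, "")):
        self.val, self.tag = v0, initial_tag

    def on_read_query_from_reader(self):
        return self.val, self.tag

    def on_read_query_from_writer(self):
        return self.tag[0]  # writers only need the sequence number

    def on_update(self, tag, val):
        if tag > self.tag:  # tuples compare lexicographically
            self.tag, self.val = tag, val
        return "ack"


def abd_write(replicas, read_q, write_q, op_id, value):
    # Read phase: learn the highest sequence number at a read quorum.
    sn = max(replicas[p].on_read_query_from_writer() for p in read_q)
    # Write phase: store the value under a strictly larger tag at a write quorum.
    tag = (sn + 1, op_id)
    for p in write_q:
        replicas[p].on_update(tag, value)
    return "ack"


def abd_read(replicas, read_q, write_q):
    # Read phase: pick the value carrying the highest tag at a read quorum.
    val, tag = max((replicas[p].on_read_query_from_reader() for p in read_q),
                   key=lambda pair: pair[1])
    # Write phase: propagate that pair to a write quorum before returning.
    for p in write_q:
        replicas[p].on_update(tag, val)
    return val
```

The write-back in `abd_read` is what lets later reads observe a value at least as recent as the one just returned, even if they contact a different quorum.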



Tag = $\mathbb{N}^{\geq 0} \times O.id$, with selectors sn and id, ordered lexicographically
Phase = {idle, p, s, c, r}, ordered so that idle < p < s < c < r

Signature:
  Input:
    request($x$)
    receive($m$)$_{p,x}$, $p \in P$, $m \in \{\mathit{ack}\} \cup \mathbb{N}^{\geq 0} \cup (\mathit{Tag} \times V)$
  Internal:
    rq-collected($q$)$_x$, $q \in \mathcal{Q}_r$
    wq-collected($q$)$_x$, $q \in \mathcal{Q}_w$
  Output:
    response($x$, $v$), $v \in V \cup \{\mathit{ack}\}$
    send($m$)$_{x,p}$, $p \in P$, $m \in \{r, w\} \cup (\mathit{Tag} \times V)$

State:
  status $\in$ Phase, initially idle
  val $\in V$, initially undefined
  tag $\in$ Tag, initially (0,i )
  read-resp $\subseteq P$, initially empty
  write-resp $\subseteq P$, initially empty
  for each $p \in P$: req-buffer$_p$ $\in$ seqof($\{r, w\} \cup (\mathit{Tag} \times V)$), initially $\lambda$

Fig. 3. The state and signature of client automata $C_x$, $x \in O$ for ABD.



Input request($x$)
  Effect:
    status := p
    for each $p \in P$:
      append ($r$) to req-buffer$_p$

Input receive($v$, $t$)$_{p,x}$
  Effect:
    read-resp := read-resp $\cup\ \{p\}$
    if status = p $\land\ t >$ tag then
      val := $v$
      tag := $t$

Internal rq-collected($q$)$_x$
  Precondition:
    status = p
    read-resp $\supseteq q$
  Effect:
    status := s
    for each $p \in P$:
      append (tag, val) to req-buffer$_p$

Input receive(ack)$_{p,x}$
  Effect:
    write-resp := write-resp $\cup\ \{p\}$

Internal wq-collected($q$)$_x$
  Precondition:
    status = s
    write-resp $\supseteq q$
  Effect:
    status := c

Output response($x$, $v$)
  Precondition:
    status = c
    val = $v$
  Effect:
    status := r

Output send($m$)$_{x,p}$
  Precondition:
    req-buffer$_p$ $\neq \lambda$
    $m$ = head(req-buffer$_p$)
  Effect:
    delete head of req-buffer$_p$

Fig. 4. Transitions of reader $C_x$, $x \in O_r$ for ABD.



Correctness of ABD: We now prove that ABD implements an atomic register.
Our strategy will be to show that ABD implements PO-Machine by exhibiting a
forward simulation from ABD to PO-Machine. In the following, for each $x \in O$,
we will use subscript $x$ to refer to the state variables of $C_x$. It is convenient
for the ABD correctness proof to define several derived variables for the ABD
automaton. These are summarized in Figure 7.



Input request($x$)
  Effect:
    status := p
    for each $p \in P$:
      append ($w$) to req-buffer$_p$

Input receive($sn$)$_{p,x}$, $sn \in \mathbb{N}^{\geq 0}$
  Effect:
    read-resp := read-resp $\cup\ \{p\}$
    if status = p $\land\ sn >$ tag.sn then
      tag.sn := $sn$

Internal rq-collected($q$)$_x$
  Precondition:
    status = p
    read-resp $\supseteq q$
  Effect:
    status := s
    tag.sn := tag.sn + 1
    for each $p \in P$:
      append (tag, $x.val$) to req-buffer$_p$

Input receive(ack)$_{p,x}$
  Effect:
    write-resp := write-resp $\cup\ \{p\}$

Internal wq-collected($q$)$_x$
  Precondition:
    status = s
    write-resp $\supseteq q$
  Effect:
    status := c

Output response($x$, ack)
  Precondition:
    status = c
  Effect:
    status := r

Output send($m$)$_{x,p}$
  Precondition:
    $m$ = head(req-buffer$_p$)
  Effect:
    delete head of req-buffer$_p$

Fig. 5. Transitions of writer $C_x$, $x \in O_w$ for ABD.



Signature:
  Input:
    receive($m$)$_{x,p}$, $x \in O$, $m \in \{r, w\} \cup (\mathit{Tag} \times V)$
  Output:
    send($m$)$_{p,x}$, $x \in O$, $m \in \{\mathit{ack}\} \cup (\mathit{Tag} \times V)$

State:
  val $\in V$, initially $v_0$
  tag $\in$ Tag, initially (0,i )
  for each $x \in O$: resp-buffer$_x$ $\in$ seqof($\{\mathit{ack}\} \cup \mathbb{N}^{\geq 0} \cup (\mathit{Tag} \times V)$), initially $\lambda$

Transitions:

Input receive($r$)$_{x,p}$
  Effect:
    append (val, tag) to resp-buffer$_x$

Input receive($w$)$_{x,p}$
  Effect:
    append (tag.sn) to resp-buffer$_x$

Input receive($t$, $v$)$_{x,p}$
  Effect:
    if $t >$ tag then
      tag := $t$
      val := $v$
    append (ack) to resp-buffer$_x$

Output send($m$)$_{p,x}$
  Precondition:
    resp-buffer$_x$ $\neq \lambda$
    $m$ = head(resp-buffer$_x$)
  Effect:
    delete head of resp-buffer$_x$

Fig. 6. Replica automaton $R_p$, $p \in P$ for ABD



Among these variables, the most interesting one is min-tag which is used to 
keep track of the lowest possible tag that could ever be determined by a client at 
the end of the read phase. At the beginning and before any replica has responded, 
min-tag is the smallest tag among the maximum tags carried by replicas in every 
read quorum. As the client is progressing through the read phase it might get a 
response from a replica whose tag is bigger than the current value of min-tag. 
In this case, the definition of min-tag ensures that min-tag is assigned to that 
higher value. Finally, upon completion of the read phase, the value of min-tag 
is fixed to be the maximum tag received during the phase. The simulation proof 
relies on the following key property of min-tag: 

Lemma 4. For each $x \in O$, min-tag($x$) is non-decreasing.



The simulation mapping from the states of ABD to the states of the
PO-Machine appears in Figure 8. The first four components of the mapping are


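pending = $\{x \in O : status_x \geq p\}$

ordered = $\{x \in O : status_x \geq s\}$

completed = $\{x \in O : status_x \geq c\}$

responded = $\{x \in O : status_x \geq r\}$

For $r \in O_r$: last-writes($r$) = $\{w \in O_w \cap \mathit{ordered} : s.tag_w = s.tag_r\}$

For $x \in O$, $p \in P$:

$$\text{new-tag}(x, p) = \begin{cases} t, & \text{if } \exists v \in V : \langle v, t \rangle \in \text{resp-buffer}_{p,x} \cup \text{channel}_{p,x} \\ (sn, x.id), & \text{if } \langle sn \rangle \in \text{resp-buffer}_{p,x} \cup \text{channel}_{p,x} \\ tag_p, & \text{otherwise} \end{cases}$$

$$\text{min-tag}(x) = \begin{cases} \max\left[tag_x, \min_{Q \in \mathcal{Q}_r} \max\{\text{new-tag}(x, p) : p \in Q \setminus \text{read-resp}_x\}\right], & \text{if } \forall Q \in \mathcal{Q}_r : \text{read-resp}_x \nsupseteq Q \\ tag_x, & \text{otherwise} \end{cases}$$

Fig. 7. Derived variables for the ABD automaton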



straightforward: All the operations that have ever been requested (indicated by 
status > idle) are mapped to vertices; the operations that have completed the 
read phase and acquired final tags (indicated by status > p) are mapped to 
ordered; and the operations that have responded (indicated by status > c) are 
mapped to responded. 

The set of edges consists only of edges among operations that have completed
their read phases (8.7). The edges among these operations are determined by
their tag order and type. Specifically, any two writes $x$ and $y$ such that $tag_x <
tag_y$ are connected by edge $(x, y)$ (8.8); and each read $x$ and write $y$ such
that $tag_x = tag_y$ are connected through edge $(y, x)$ (8.9). To maintain the
mapping for edges, each rq-collected($x$) for $x \in O_w$ is simulated by a sequence of
add-edge($y$, $x$) for each ordered write operation $y$ such that $tag_y < tag_x$, followed
by order($x$); and each rq-collected($x$) for $x \in O_r$ is simulated by a sequence of
add-edge($y$, $x$) for each ordered operation $y$ such that $tag_y = tag_x$. No actions
involving unordered operations (i.e., the operations with status < s) result in
adding new edges.



$f$ is the relation over states(PO-Machine) $\times$ states(ABD) such that each $(s, u) \in f$ iff:

1. $u.requested = s.requested$

2. $u.vertices = s.pending$

3. $u.ordered = s.ordered$

4. $u.completed = s.completed \cup \bigcup_{r \in O_r \cap s.completed} s.\text{last-writes}(r)$

5. $u.responded = s.responded$

6. For all $x \in u.vertices$, if $y \in u.prec(x)$, then $s.tag_y \leq s.\text{min-tag}(x)$

7. $u.dag \subseteq s.ordered \times s.ordered$

8. For all $x, y \in O_w \cap u.ordered$, if $(x, y) \in u.dag$, then $s.tag_x < s.tag_y$

9. For all $x \in O_w \cap u.ordered$ and $y \in O_r \cap u.ordered$, $(x, y) \in u.edges$ iff $s.tag_x = s.tag_y$

Fig. 8. Forward simulation from ABD to PO-Machine



The most interesting part of the proof is to show that order($x$) becomes
enabled once all the $(y, x)$ edges have been added. For that we need to show
that the tag acquired by $x$ at the end of the read phase is at least as big as
the tag of every operation that had completed before $x$ began. Since at the
end of the read phase, $tag_x$ = min-tag($x$), the necessary enabling condition is
provided by part 8.6 of the mapping that requires that for each $y \in$ prec($x$),
$tag_y \leq$ min-tag($x$).

To show that 8.6 is maintained throughout the read phase of x, request(.x) 
is simulated by the request(a;) action of the PO-Machine; and each receive is 
simulated by the empty sequence. Since at the time x is invoked, the tag of every 
y G prec[x) has been stored at a write quorum of replicas, and because every pair 
of write and read quorums intersects, minggQ,. max pe Q{ta<7 p } > tag y . Hence, 8.6 
is preserved by request(.x). Finally since min-tag{x) is non-decreasing (Lemma 4) 
and prec(x) is not affected by any action except request, 8.6 is preserved by 
receive. Hence, by the end of the read phase of x, for each y £ prec(x), tag y < 
min-tag{x) as required. 
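
The quorum-intersection step can be checked by brute force on a small example. The sketch below assumes five replicas, majority quorums used for both reads and writes, and integer tags; these choices, and the names `min_tag` and `holds_for_all_write_quorums`, are illustrative assumptions rather than part of the ABD specification.

```python
from itertools import combinations

replicas = range(5)
# Assumed quorum system: all 3-element subsets of 5 replicas (any two intersect).
quorums = [set(q) for q in combinations(replicas, 3)]


def min_tag(tags, read_quorums):
    """Minimum, over all read quorums Q, of the largest tag stored in Q."""
    return min(max(tags[p] for p in q) for q in read_quorums)


def holds_for_all_write_quorums(tag_y):
    """Check min-tag >= tag_y whenever tag_y is stored at some write quorum."""
    for wq in quorums:
        tags = {p: 0 for p in replicas}   # every replica starts with tag 0
        for p in wq:
            tags[p] = tag_y               # the write has reached this quorum
        if min_tag(tags, quorums) < tag_y:
            return False
    return True


print(holds_for_all_write_quorums(7))
```

If a read quorum were allowed to miss the write quorum entirely, the bound would fail, which is why the intersection property is needed.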

We argued informally that the mapping in Figure 8 is a forward simulation 
from ABD to the PO-Machine. A detailed proof appears in [6]. 

Lemma 5. The mapping in Figure 8 is a forward simulation from ABD to the 
PO-Machine. 

Since, by Theorem 1, each trace of the PO-Machine satisfies atomicity, the 
same is true for every trace of ABD: 

Theorem 2. Each trace of ABD satisfies atomicity. 

Automated Tools Support: We have used the TIOA-to-PVS translator and the TAME 
library [2] to generate descriptions of the PO-Machine and the ABD algorithm 
in the language of the Prototype Verification System (PVS) [26]. We used PVS 
to substantially increase the level of detail and assurance of some of our previous 
hand proofs. In fact, we discovered several gaps and bugs in our hand proofs. 
Automatic translation enabled us to easily tweak the simulation relations and 
rerun the proof scripts. We also used the IOA code generator tool [31, 30] to 
compile the verified ABD automaton into executable Java code. This way, 
a single formal representation of the ABD algorithm was used for specification, 
verification, and execution. 

6 Timed ABD 

In this section, we present an optimized version of the ABD protocol, called 
Timed-ABD, that takes advantage of perfectly synchronized clocks at the writers 
to eliminate the read phase of the write implementation (see [8]). 

Timed-ABD is the composition of the following timed automata: the 
replica and reader client automata in Figures 6 and 4, respectively, augmented 
with arbitrary trajectories that keep their state unchanged; and the writer client 
automata, whose code appears in Figure 9. To model synchronized clocks, each 
writer maintains a local variable clock whose trajectory is d(clock) = 1 (i.e., the 
clock value grows continuously, at the same rate as real time). 

The writer algorithm is as follows: To write a value, the writer first takes its 
current clock reading, and then delays its execution until its clock exceeds the 
initial reading. The second clock reading is used as the tag with which the client 
performs the write phase. 

Signature: 
  Input: 
    request(x) 
    receive(m)_{p,x}, p ∈ P, m ∈ {ack} ∪ A/*^ 
  Internal: 
    wq-collected(q)_x, q ∈ Q_w 
  Output: 
    response(x, v), v ∈ {ack} 
    send(m)_{x,p}, m ∈ {w} ∪ (Tag × V) 

State: 
  clock ∈ ℝ, initially 
  Discrete req-time ∈ ℝ, initially 
  status ∈ Phase, initially idle 
  tag ∈ Tag, initially (0, x.id) 
  write-resp ⊆ P, initially empty 
  for each p ∈ P: req-buffer_p ∈ seqof({w} ∪ (Tag × V)), initially λ 

Transitions: 
  Input request(x) 
    Effect: 
      status := p 
      req-time := clock 

  Internal order_x 
    Precondition: 
      clock > req-time 
      status = p 
    Effect: 
      status := s 
      for each p ∈ P: 
        append (tag, x.val) to req-buffer_p 

  Input receive(ack)_{p,x} 
    Effect: 
      write-resp := write-resp ∪ {p} 

  Internal wq-collected(q)_x 
    Precondition: 
      status = s 
      write-resp ⊇ q 
    Effect: 
      status := c 

  Output response(x, ack) 
    Precondition: 
      status = c 
    Effect: 
      status := r 

  Output send(m)_{x,p} 
    Precondition: 
      m = head(req-buffer_p) 
    Effect: 
      delete head of req-buffer_p 

Trajectories: 
  evolve 
    d(clock) = 1 
    all other state variables remain constant 

Fig. 9. Writer client C_x, x ∈ O_w, for Timed-ABD 
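
The two clock readings of the writer can be illustrated with a small Python model. This is a sketch under stated assumptions: the class `TimedWriter` and its method names are hypothetical, the tag is assumed to be the pair (second clock reading, writer id), and the write phase itself (sending the tag and collecting acknowledgements from a write quorum) is omitted.

```python
class TimedWriter:
    """Toy model of the writer's clock handling; no messages are exchanged."""

    def __init__(self, writer_id):
        self.writer_id = writer_id
        self.clock = 0.0          # assumed starting value for the illustration
        self.status = "idle"
        self.req_time = None
        self.tag = None

    def advance(self, dt):
        # Trajectory: the clock grows at the same rate as real time.
        self.clock += dt

    def request(self):
        self.status = "p"
        self.req_time = self.clock           # first clock reading

    def try_order(self):
        # Enabled only once the clock exceeds the reading taken at request time.
        if self.status == "p" and self.clock > self.req_time:
            self.tag = (self.clock, self.writer_id)   # second reading is the tag
            self.status = "s"
            return True
        return False


w = TimedWriter(writer_id=1)
w.request()
print(w.try_order())          # the clock has not advanced since the request
w.advance(0.5)
print(w.try_order(), w.tag)
```

Because the tag is taken strictly after the request time, it exceeds the clock-based tag of any write that completed before this request, which is the property that replaces the read phase.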

The simulation mapping from the states of Timed-ABD to the states of 
Timed-PO (i.e., the PO-Machine augmented with arbitrary trajectories that do 
not change its state) appears in Figure 10. To see that the mapping is preserved, 
we observe that a write operation becomes ordered once it is verified that a 
non-zero amount of time has elapsed since it was requested. We therefore 
simulate each Timed-ABD trajectory corresponding to a non-zero time interval by a 
trajectory of Timed-PO of the same length, followed by a sequence of add-edge 
actions, followed by order. The rest of the simulation proof is straightforward 
(see [6] for details). 



f is the relation over states(Timed-ABD) × states(Timed-PO) such that (s, u) ∈ f iff: 

1-5: Identical to 1-5 in Figure 8 

6: For all x ∈ u.vertices ∩ O_r, if y ∈ u.prec(x), then s.tag_y ≤ s.min-tag(x) 

7-8: Identical to 7-8 in Figure 8 

9: For all x ∈ u.vertices ∩ O_w, if y ∈ u.prec(x), then s.tag_y.sn < s.req-time_x 

10: For all x ∈ (u.vertices − u.ordered) ∩ O_w and y ∈ u.ordered ∩ O_w, if s.tag_y.sn < s.clock_x, then (y, x) ∈ 

Fig. 10. Forward simulation from Timed-ABD to Timed-PO 



7 Other Algorithms 

We discuss briefly how to prove atomicity of the unbounded multi-writer/multi- 
reader register construction of Vitanyi and Awerbuch [36] (referred to henceforth 
as VA), and of a general atomic object implementation based on the replicated 
state machine algorithm of Lamport [15] (referred to henceforth as RSM). 

First, we observe that VA can be recast as a special case of ABD with the 
write quorums being the rows and the read quorums being the columns of the 
matrix. Therefore, the simulation proof of VA is almost identical to that of ABD. 
In particular, it is easy to see that the simulation from ABD to PO-Machine in 
Figure 8 is also a forward simulation from VA to PO-Machine. 
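
The reason rows and columns form a valid quorum system is that every row shares exactly one cell with every column. The following Python sketch checks this for a small square matrix; the helper names are hypothetical, and the matrix cells are simply modeled as (row, column) index pairs.

```python
def row_quorums(n):
    """Write quorums: the rows of an n x n matrix of cells."""
    return [{(i, j) for j in range(n)} for i in range(n)]


def column_quorums(n):
    """Read quorums: the columns of the same matrix."""
    return [{(i, j) for i in range(n)} for j in range(n)]


def all_pairs_intersect(write_quorums, read_quorums):
    """True if every write quorum shares at least one cell with every read quorum."""
    return all(w & r for w in write_quorums for r in read_quorums)


n = 4
print(all_pairs_intersect(row_quorums(n), column_quorums(n)))
```

Two distinct rows do not intersect each other, so the construction relies only on the write/read intersection that the ABD argument requires.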

To prove atomicity of RSM, we use a simplified version of the PO-Machine, 
called TO-Machine. The TO-Machine constructs a single total order of all the 
requested operations. In particular, every operation becomes ordered only after 
it is ordered relative to all the other ordered operations. The TO-Machine is 
parameterized by the emulated object's sequential specification and initial state, 
which are used to compute responses. The simulation proof is based on the 
observation that in RSM, an operation x becomes ordered once the local timestamp 
at each replica becomes greater than that of x. The full proof appears in [6]. 
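
The ordering condition used in the RSM simulation can be sketched as follows. This is a simplified illustration, assuming timestamps are unique (counter, process id) pairs compared lexicographically; the functions `is_ordered` and `total_order` are hypothetical names and do not model message delivery or response computation.

```python
def is_ordered(op_timestamp, replica_timestamps):
    """An operation is ordered once every replica's local timestamp exceeds its own."""
    return all(ts > op_timestamp for ts in replica_timestamps)


def total_order(ops, replica_timestamps):
    """Return the names of the ordered operations, sorted by timestamp.

    ops maps an operation name to its (counter, process id) timestamp.
    """
    ready = [name for name, ts in ops.items()
             if is_ordered(ts, replica_timestamps)]
    return sorted(ready, key=lambda name: ops[name])


ops = {"a": (1, 0), "b": (3, 1), "c": (2, 2)}
replica_ts = [(3, 0), (4, 1), (3, 2)]
print(total_order(ops, replica_ts))
```

Here operation b is not yet ordered, because one replica's local timestamp (3, 0) is still smaller than b's timestamp (3, 1); no operation with a smaller timestamp than an ordered one can appear later, so the ordered prefix is stable.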

8 Conclusions and Future Work 

Our work with four algorithms so far suggests to us that our PO-Machine (or 
small variants) may be general enough to capture many of the existing atomic 
register algorithms. We plan to use these methods to study a wider variety of al- 
gorithms, such as bounded-timestamp-based constructions (see e.g., [34]), whose 
proofs have been notoriously difficult and bug-prone. An interesting challenge 
will be to extend our framework to capture implementations that are not ex- 
plicitly based on timestamps, for example, the construction that creates atomic 
bits from safe bits [32]. Another interesting direction deals with adapting the 
PO-Machine to capture weaker register semantics, such as safe registers, 
regular registers (including the multi-writer regular registers of Welch [29]), and 
sequentially consistent registers. There has recently been increased interest in 
these semantics, as they capture the guarantees provided by many Byzantine-resilient 
storage systems [24, 25, 1] based on Byzantine quorums [24]. 

Yet another interesting application domain for our techniques is the verifi- 
cation of multi-threaded programs based on lock-free synchronization primitives 
(such as CAS, LL/SC, etc.). This area has recently received increased 
attention due to the growing popularity of multi-processor computing platforms 
and the introduction of lock-free synchronization primitives into the Java 
concurrency package. 

Finally, we are interested in identifying common patterns behind many di- 
verse implementations of atomic objects. This will make it easier to understand 
and compare different algorithms. We expect that such patterns should be ex- 
pressible in terms of common specification automata (e.g., a unified version of 
the PO- and TO-Machines). 